NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

PIM GPT a hybrid process in memory accelerator for autoregressive transformers

https://doi.org/10.1038/s44335-024-00004-2

Wu, Yuting; Wang, Ziyu; Lu, Wei D (December 2024, npj Unconventional Computing)

Abstract Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by low compute-to-memory-ratio and high memory access. In this work, we propose a Process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41 − 137 × , 631 − 1074 × speedup and 123 − 383 × , 320 − 602 × energy efficiency over GPU and CPU baseline on 8 GPT models with up to 1.4 billion parameters.
more » « less
Full Text Available
Genetically Encoded Fluorogenic DNA Aptamers for Imaging Metabolite in Living Cells

https://doi.org/10.1021/jacs.4c09855

Wu, Yuting; Kong, Wentao; Van_Stappen, Jacqueline; Kong, Linggen; Huang, Zhimei; Yang, Zhenglin; Kuo, Yu-An; Chen, Yuan-I; He, Yujie; Yeh, Hsin-Chih; et al (January 2025, Journal of the American Chemical Society)

Genetically encoded fluorescent protein and fluorogenic RNA sensors are indispensable tools for imaging biomolecules in cells. To expand the toolboxes and improve the generalizability and stability of this type of sensor, we report herein a genetically encoded fluorogenic DNA aptamer (GEFDA) sensor by linking a fluorogenic DNA aptamer for dimethylindole red with an ATP aptamer. The design enhances red fluorescence by 4-fold at 650 nm in the presence of ATP. Additionally, upon dimerization, it improves the signal-to-noise ratio by 2–3 folds. We further integrated the design into a plasmid to create a GEFDA sensor for sensing ATP in live bacterial and mammalian cells. This work expanded genetically encoded sensors by employing fluorogenic DNA aptamers, which offer enhanced stability over fluorogenic proteins and RNAs, providing a novel tool for real-time monitoring of an even broader range of small molecular metabolites in biological systems.
more » « less
Full Text Available
TT-CIM: Tensor Train Decomposition for Neural Network in RRAM-Based Compute-in-Memory Systems

https://doi.org/10.1109/TCSI.2023.3344550

Meng, Fan-Hsuan; Wu, Yuting; Zhang, Zhengya; Lu, Wei D (March 2024, IEEE Transactions on Circuits and Systems I: Regular Papers)

Full Text Available
PowerGAN: A Machine Learning Approach for Power Side‐Channel Attack on Compute‐in‐Memory Accelerators

https://doi.org/10.1002/aisy.202300313

Wang, Ziyu; Wu, Yuting; Park, Yongmo; Yoo, Sangmin; Wang, Xinxin; Eshraghian, Jason K; Lu, Wei D (December 2023, Advanced Intelligent Systems)

Analog compute‐in‐memory (CIM) systems are promising candidates for deep neural network (DNN) inference acceleration. However, as the use of DNNs expands, protecting user input privacy has become increasingly important. Herein, a potential security vulnerability is identified wherein an adversary can reconstruct the user's private input data from a power side‐channel attack even without knowledge of the stored DNN model. An attack approach using a generative adversarial network is developed to achieve high‐quality data reconstruction from power leakage measurements. The analyses show that the attack methodology is effective in reconstructing user input data from power leakage of the analog CIM accelerator, even at large noise levels and after countermeasures. To demonstrate the efficacy of the proposed approach, an example of CIM inference of U‐Net for brain tumor detection is attacked, and the original magnetic resonance imaging medical images can be successfully reconstructed even at a noise level of 20% standard deviation of the maximum power signal value. This study highlights a potential security vulnerability in emerging analog CIM accelerators and raises awareness of needed safety features to protect user privacy in such systems.
more » « less
Full Text Available
Bulk‐Switching Memristor‐Based Compute‐In‐Memory Module for Deep Neural Network Training

https://doi.org/10.1002/adma.202305465

Wu, Yuting; Wang, Qiwen; Wang, Ziyu; Wang, Xinxin; Ayyagari, Buvna; Krishnan, Siddarth; Chudzik, Michael; Lu, Wei D (November 2023, Advanced Materials)

Abstract The constant drive to achieve higher performance in deep neural networks (DNNs) has led to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor‐based compute‐in‐memory (CIM) modules can perform vector‐matrix multiplication (VMM) in place and in parallel, and have shown great promises in DNN inference applications. However, CIM‐based model training faces challenges due to non‐linear weight updates, device variations, and low‐precision. In this work, a mixed‐precision training scheme is experimentally implemented to mitigate these effects using a bulk‐switching memristor‐based CIM module. Low‐precision CIM modules are used to accelerate the expensive VMM operations, with high‐precision weight updates accumulated in digital units. Memristor devices are only changed when the accumulated weight update value exceeds a pre‐defined threshold. The proposed scheme is implemented with a system‐onchip of fully integrated analog CIM modules and digital sub‐systems, showing fast convergence of LeNet training to 97.73%. The efficacy of training larger models is evaluated using realistic hardware parameters and verifies that CIM modules can enable efficient mix‐precision DNN training with accuracy comparable to full‐precision software‐trained models. Additionally, models trained on chip are inherently robust to hardware variations, allowing direct mapping to CIM inference chips without additional re‐training.
more » « less
Full Text Available
Columnar Learning Networks for Multisensory Spatiotemporal Learning

https://doi.org/10.1002/aisy.202200179

Yoo, Sangmin; Park, Yongmo; Wang, Ziyu; Wu, Yuting; Medepalli, Saaketh; Thio, Wesley; Lu, Wei D. (November 2022, Advanced Intelligent Systems)

Full Text Available
Spatiotemporal Spike Pattern Detection with Second-order Memristive Synapses

https://doi.org/10.1109/ISCAS48785.2022.9937414

Wu, Yuting; Yoo, Sangmin; Meng, Fan-Hsuan; Lu, Wei D. (May 2022, IEEE International Symposium on Circuits and Systems (ISCAS))

Full Text Available
Tuning Resistive Switching Behavior by Controlling Internal Ionic Dynamics for Biorealistic Implementation of Synaptic Plasticity

https://doi.org/10.1002/aelm.202101025

Yoo, Sangmin; Wu, Yuting; Park, Yongmo; Lu, Wei D. (February 2022, Advanced Electronic Materials)

Full Text Available
RRAM-enabled AI Accelerator Architecture

https://doi.org/10.1109/IEDM19574.2021.9720543

Wang, Xinxin; Wu, Yuting; Lu, Wei D. (December 2021, 2021 IEEE International Electron Devices Meeting (IEDM))

Full Text Available
Hierarchical architectures in reservoir computing systems

https://doi.org/10.1088/2634-4386/ac1b75

Moon, John; Wu, Yuting; Lu, Wei D (August 2021, Neuromorphic Computing and Engineering)

Abstract Reservoir computing (RC) offers efficient temporal data processing with a low training cost by separating recurrent neural networks into a fixed network with recurrent connections and a trainable linear network. The quality of the fixed network, called reservoir, is the most important factor that determines the performance of the RC system. In this paper, we investigate the influence of the hierarchical reservoir structure on the properties of the reservoir and the performance of the RC system. Analogous to deep neural networks, stacking sub-reservoirs in series is an efficient way to enhance the nonlinearity of data transformation to high-dimensional space and expand the diversity of temporal information captured by the reservoir. These deep reservoir systems offer better performance when compared to simply increasing the size of the reservoir or the number of sub-reservoirs. Low frequency components are mainly captured by the sub-reservoirs in later stage of the deep reservoir structure, similar to observations that more abstract information can be extracted by layers in the late stage of deep neural networks. When the total size of the reservoir is fixed, tradeoff between the number of sub-reservoirs and the size of each sub-reservoir needs to be carefully considered, due to the degraded ability of individual sub-reservoirs at small sizes. Improved performance of the deep reservoir structure alleviates the difficulty of implementing the RC system on hardware systems.
more » « less
Full Text Available

« Prev Next »

Search for: All records